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since  the  June  Ir  1980  starting  date  of  our  contract  with  the 
U,  S.  Army  Research  Office  (DAAG29-80-K-0006) ,  significant  progress 
has  been  made  in  the  areas  of  proposed  research,  we  believe  that 
the  power  and  versatility  of  Grenander's  Method  of  Sieves  as  a 
general  approach  to  nonparametric  estimation  is  now  well 
established.  Depending  on  the  chosen  sieve^  it  offers  an  array  of 
new  approaches  in  specific  applications  and  it  frequently 
contributes  to  our  understanding  of  well-studied  estimators  and 
suggests  improvements  of  them.  Although  our  theoretical 
understanding  is  not  complete,  we  now  have  a  firm  mathematical  and 
intuitive  understanding  of  the  method.  In  the  past  year  and  a  half 
we  have  been  increasingly  supplementing  our  analytic  approach  with 
more  detailed  explorations  of  specific  practical  applications. 

In  Section  II  we  shall  review  the  essential  ideas  behind  the 
method  of  sieves  and  lay  the  foundations  of  the  problems  that  have 
guided  our  research  during  the  contract  period.  Section  III 
summarizes  our  progress  and  highlights  concrete  results  that  have 
been  obtained.  The  summary  identifies  four  broad  areas  where 
progress  has  been  made:  the  general  theory  of  sieves  for 
nonparametric  estimation,  theoretical  and  methodological  aspects  of 
selecting  sieve  parameters  (cross-validation) ,  studies  of  specific 
method-of-sieve  estimators,  and  methodology  and  development  of 
reources  for  computer  experiments.  Section  IV  describes  projects 
that  are  currently  being  pursued,  including  manuscripts  in  various 
stages  of  preparation  and  collaborative  efforts  on  applications  of 


6 


nonpar ametric  estimation  theory  and  the  method  of  sieves  to  digital 
picture  processing.  Image  processing  problems  of  segmentation, 
registration,  and  reconstruction  have  proven  to  be  a  fertile  source 
of  interesting  and  important  theoretical  and  applied  problems  in 
nonparametric  estimation.  Section  V  lists  the  technical  reports  and 
papers  that  document  progress  on  specific  research  problems  in  our 
project.  Section  VI  acknowledges  the  personnel  who  have  contributed 
to  the  effort. 


II .  The  Method  of  Sieves^ 

Techniques  for  estimating  finite  dimensional  parameters 
typically  fail  when  applied  to  infinite  dimensional  problems. 

The  difficulties  encountered  in  moving  from  finite  to  infinite 
dimensions  are  well  illustrated  by  the  failure  of  maximum 
likelihood  in  nonparametric  density  estimation.  Let 
be  an  iid  sample  from  an  absolutely  continuous  distribution  with 
unknown  probability  density  function  (pdf)  QqCx).  The  maximum 
likelihood  estimator  for  Oq  maximizes 

Cl)  Cl  aCXj) 

over  some  specified  set  of  candidates.  But  if  this  set  is  too 
large,  then  the  method  will  fail  to  produce-  a  meaningful  estimator. 

Much  of  the  material  in  this  section  appeared  in  a  paper  by 
Geman  and  Hwang  [19].  It  is  included  here  for  the  purpose  of 
making  this  proposal,  as  nearly  as  possible,  self-contained. 
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For  instance,  in  the  extreme  case  nothing  is  known  about  Oq, 
and  the  maximum  of  (1)  is  not  achieved.  Roughly  speaking,  we 
move  out  of  the  parameter  space  (the  space  of  all  densities) 
toward  a  discrete  distribution  with  jumps  at  the  sample  points. 

Another  example  of  the  failure  of  classical  methods  to  solve 
infinite  dimensional  problems  is  the  breakdown  of  least  squares 
in  the  nonparametric  estimation  of  a  regression.  Let  X  and  Y  be 

random  variables  and  let  (Xj^,yj^) . ^^n’^n^  sample 

from  the  bivariate  distribution  of  (X,Y).  The  least  squares 
estimator  of  the  regression  function  E(Y|X=x)  minimizes 

(2) 

Observe  that  the  minimum  is  zero  and  is  achieved  by  any  function 

which  passes  through  all  of  the  points  of  observation 

(Xf ,yi) , . . . ,  (Xn,yn) •  Excepting  some  very  special  cases,  this 

set  does  not  in  any  meaningful  sense  converge  to  the  true  regression. 

Grenander  (24]  suggests  the  following  remedy:  perform  the 
optimization  (maximization  of  the  likelihood,  minimization  of  the 
sum  of  square  errors,  etc.)  within  a  subset  of  the  parameter  space, 
and  then  allow  this  subset  to  "grow"  with  the  sample  size.  He 
calls  this  collection  of  subsets  from  which  the  estimator  is 
drawn  a  "sieve,"  and  the  resulting  estimation  procedure  is  his 
"method  of  sieves."  The  method  leads  easily  to  consistent 
nonparametric  estimators  in  even  the  most  general  settings,  with 
different  sieves  giving  rise  to  different  estimators.  Often  the 
sieve  estimator  is  closely  related  to  an  already  well-studied 
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estimator,  and  may  suggest  an  improvement,  or  a  new  point  of 
view  and  a  new  motivation. 

The  histogram  is  a  simple  example  of  a  sieve  estimator. 
Consider  again  the  problem  of  estimating  an  entirely  unknown 
density  function  a^Cx).  We  have  seen  that  unmodified  maximum 
likelihood  is  not  consistent  for  this  problem.  A  sieve  is  an 
indexed  collection  of  subsets  of  the  parameter  space,  such  as: 

k  - 1  k 

=  {a:  is  a  pdf  which  is  constant  on 

k=0,±l,±2, . .  .  } 

X  >  0,  The  method  of  sieves  estimator  maximizes  the  likelihood 
n”_j^a(x^),  subject  to  aCS^,  allowing  X  to  grow  w'ith  the  sample 
size.  The  well-known  solution  is  the  function 

S(x)  .  i  Mx^:  ><^<  xj  <  |}  for  x€[i^,|), 

i.e.  the  histogram  with  bin  width  X'^.  Putting  aside  details, 
we  know  that  if  X  +  ^  sufficiently  slowlv,  then  a  is  consistent, 
e.g.  in  the  sense  that  |  a  (x) -Uq  (x)  ]  dx  -►  0  a.s. 

For  the  same  problem,  a  different  and  more  interesting  sieve 
is  the  "convolution  sieve": 

X^ ,  .2 

X  ■ 

S.  *  {a:a(x)  =  -  e  F(dv),  F  an  arbitrarv  cdf}. 

This  time,  maximizing  the  likelihood  within  gives  rise  to  an 
estimator  closely  related  (but  not  identical)  to  the  Parzen- 
Rosenblatt  (Gaussian)  kernel  estimator.  In  fact,  the  latter 
is  in  the  sieve  :  take  F  to  be  the  empirical  distribution 
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function.  But  the  maximum  of  the  likelihood  is  achieved  by 
using  a  different  distribution.  As  with  the  Parzen-Rosenblatt 
estimator,  if  +  «>  sufficiently  slowly  (i.e.  the  "window  width" 
is  decreased  sufficiently  slowly)  then  the  estimator  is  consistent. 
A  more  precise  discussion  of  this  and  some  related  sieves  is  in 
section  III. 

The  inconsistency  of  least  squares  nonparametric  regression 
can  be  similarly  rectified  by  introducing  sieves.  Let  us  look 
again  at  the  regression  problem  formulated  above;  recall  that 
(Xi ,yi) , . . . , (Xn ,yn)  is  an  iid  sample  from  the  bivariate 
distribution  of  (X,Y).  Given  a  sieve  ,  the  method  of  sieves 
estimator  a  minimizes  the  sum  of  square  errors,  (2),  subject 
to  aeS^.  If,  as  an  example. 


S,  =  {a:a  absolutely  continuous. 


id 


a (x) I ^dx  <  X } 


then  a  is  uniquely  determined;  it  is  a  first  degree  polynomial 
smoothing  spline;  i.e.  a  is  continuous  and  piecewise  linear  with 
discontinuities  in  (d/dx)a  at  Xj^,...,x^;  see  Schoenberg  [41]. 

It  is  possible  to  show  that  if  X^  increases  sufficiently  slowly, 
then  the  estimator  is  strongly  consistent  for  E(y|X=x)  in  a 
suitable  metric;  details  are  in  Geman  [18].  Other  sieves  applied 
to  the  same  problem  lead  to  kernel  estimators  and  still  others 
to  new  estimators.  Even  if  the  squared  loss  function  {y-a(x)}“ 
is  replaced  by  a  "robust"  alternative,  minimization  over  too 
large  a  set  will  again  fail  to  produce  a  meaningful  estimator. 

In  exa'''  iy  the  same  way,  sieves  offer  a  remedy  in  this  case  as  well. 
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III.  Progress  During  Current  Contract  Period 

We  will  describe  in  detail  our  progress  during  the  current 
contract  period.  Most  of  what  we  will  say  is  included  in 
finished,  and  in  some  cases  already  published,  manuscripts.  In 
addition  to  making  our  presentation  self-contained,  the  details 
included  here  will  make  our  later  discussion  of  proposed  research 
much  clearer. 

A.  A  General  Consistency  Argument  for  Method  of  Sieves 
Estimators . 

We  began  our  work  on  the  method  of  sieves  by  formulating 
theorems  establishing  the  consistency  of  sieve  estimators 
derived  from  the  maximum  likelihood  procedure.  For  the  problems 
addressed,  these  theorems  hold  in  the  most  general  possible 
settings.  In  our  original  proposal  we  expressed  our  intention 
to  extend  these  theorems  to  apply  to  a  wider  variety  of  problems, 
including  other  criterion  functions  (such  as  least  squares, 
robust  loss  functions,  etc.)  and  including  observations,  such  as 
continuous  time  stochastic  processes,  which  are  not  naturally 
indexed  by  a  discrete  parameter.  We  found,  however,  that  this 
greater  scope  led  to  theorems  whose  application  required  the 
verification  of  numerous  complicated  conditions. 

The  consistency  problem  for  the  method  of  sieves  is 
essentially  one  of  identifying  an  appropriate  upper  bound  on 
the  rate  of  growth  for  the  sieve.  Instead  of  a  general  theorem 
for  this  identification,  we  have  developed  a  general  approach. 
What  we  have  is  a  versatile  strategy  for  determining  bounds  on 
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growth  rates  of  sieves,  and  it  easily  achieves  all  of  the  desired 
extensions.  Whereas  a  rigorous  explanation  of  the  technique  would 
be  unnecessarily  involved,  we  can  briefly  describe  it  in  a 
heuristic  manner  and  refer  to  several  papers  for  details  of  its 
application. 

Our  approach  to  establishing  consistency  for  sieves  is  an 
adaptation  of  methods  first  developed  by  Bahadur  [5],  Wald  [56], 
and  others  in  similar  connection  with  extensions  of  the  maximum 
likelihood  procedure.  It  might  be  called  the  "small-ball 
technique",  as  it  consists  of  partitioning  the  sieve  into  balls 
sufficiently  small  so  that  all  estimators  within  each  ball  behave 
similarly.  One  first  demonstrates  that  when  using  the  estimator 
at  the  center  of  one  of  these  balls  the  value  of  the  criterion 
function  (log-likelihood,  sum  of  square  errors,  etc.)  is  well- 
approximated  by  its  expected  value,  at  least  with  high  probability. 
If  this  approximation  can  be  made  to  hold  uniformly  over  all 
centers,  then  it  will  also  hold  over  the  entire  sieve  -  assuming, 
again,  that  the  balls  are  sufficiently  small.  As  the  sieve  grows, 
the  balls  are  made  smaller  and  more  numerous.  If  the  sieve 
growth  is  sufficiently  slow  then  a  "uniform  laAv  of  large  numbers" 
can  be  established:  optimizing  the  criterion  function  over  the 
sieve  is  asymptotically  equivalent  to  optimizing  its  expected 
value.  The  point  is  that  the  latter  optimization  typically 
defines  the  target  parameter.  As  an  example  let  Oq  be  an  unknown 
density  function,  and  consider  the  likelihood  crtierion:  the 
maximization  of 
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over  “ C  is  asymptotically  equivalent  to  the  maximization  of 
1  ” 

E  i  log  a(x.)  =  a„(x)log  ct(x)dx 
^1=1  1  J  u 

provided  the  sieve  growth  is  sufficiently  slow  (>  grows  slowly 
with  n).  It  is  easily  demonstrated  that 

aQ(x)log  a(x)dx 


achieves  its  maximum  at  a  =  a^,  the  unkno\\n  density.  N'onparametr ic 
regression  by  least  squares  leads  to  the  problem  of  minimizing 


1 

n 


(y . -ot(xp)" 


over  a  sieve  a  €  (we  use  the  notation  introduced  in  section  II). 
Provided  that  the  sieve  growth  is  sufficiently  slow,  this  is 
asymptotically  equivalent  to  the  minimization  of 


J  (y.-a(x.)) 
i=l  ^  ^ 


E(Y-a(X))^ 


For  the  latter,  a  minimum  is  achieved  by  the  regression 
a(x)  =  E{y1X=x},  i.e.  by  the  target  parameter.  For  robust  loss 
functions,  the  minimizer  of  the  expected  criterion  function  may 
or  may  not  be  the  target  parameter,  depending  on  the  true  under¬ 
lying  distribution.  Hence,  sieve  estimators  derived  from  robust 
loss  functions  have  an  added  condition  for  consistency. 

By  now  we  have  numerous  examples  of  the  application  of 
this  technique  to  problems  of  density  estimation,  regression 
(with  and  without  robust  loss  functions),  the  estimation  of  convex 
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sets  from  noisy  one  dimensional  projections  (a  problem  arising 
in  emission  tomography),  the  estimation  of  images  of  conformal 
mappings  from  noisy  boundary  observations,  and  the  estimation 
of  the  mean  and  drift  functions  for  various  stochastic  processes. 

For  the  latter,  we  are  able  to  exend  our  original  results  from 
the  artificial  restriction  of  using  only  partial  and  discrete 
observations  of  a  continuous  time  process  to  the  more  natural 
and  no  doubt  more  efficient  use  of  the  entire  continuous  time 
development  of  the  process.  Details  on  some  of  these  applications 
of  the  ”small-ball  technique"  can  be  found  in  the  references 
[  9  ]  ,  [17]  ,  [18]  ,  [19]  ,  and  [  3S]  . 

B .  Cr'oss-val  idated  Smoothing  . 

In  our  original  proposal  we  indicated  that  consistency 
results  for  estimators  smoothed  by  cross-validation  would  be 
given  the  highest  priority  in  our  research  program.  In  this 
section,  we  will  rev’iew  the  role  of  cross-validation  in 
nonparametric  estimation,  and  we  will  summarize  our  progress 
in  establishing  properties  of  estimators  smoothed  by  this  technique. 

1 .  The  Smoothing  Problem 

The  practical  application  of  nonparametric  (infinite 
dimensional)  estimators  requires  the  choice  of  a  "smoothing 
parameter".  For  estimators  generated  by  the  Method  of  Sieves, 
we  can  specify  as  a  function  of  sample  size  an  upper  bound  on 
the  rate  of  growth  of  the  sieve  which  will  guarantee  consistency, 
but  this  rate  tells  us  very  little  about  the  practical  choice 
of  a  sieve  size  when  faced  with  a  finite  fixed  sample.  Too 
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large  a  sieve  produces  an  undersmoothed ,  or  "overfit", 
estimator,  and  too  small  a  sieve  produces  an  estimator  that 
is  little  influenced  by  the  observations  -  an  oversmoothed 
estimator.  The  proper  choice  of  sieve  size  for  finite  fixed 
samples  is  the  smoothing  problem  for  the  method  of  sieves,  and 
it  has  its  analogue  for  all  non-Bayesian  estimators  of  infinite 
dimensional  parameters.  Thus  the  histogram  requires  the  choice 
of  a  bin-width;  the  kernel  estimator  requires  the  choice  of  a 
window  width;  the  maximum  penalized  likelihood  estimator  ([22], 
[51])  requires  that  a  weight  be  assigned  the  penalty  term;  and 
orthogonal  series  estimators  must  be  suitably  truncated  [29], 
or  "band  limited"  [54].  Regarding  kernel  estimators,  Silverman 
[46]  observes  that  "there  seems  to  be  considerable  need  for 
objective  methods  of  determining  the  window  width  appropriate 
to  a  given  sample".  Speaking  more  generally,  Wahba  [54]  remarks: 
"A  major  problem  in  density  estimation  is  to  choose  the  smoothing 
parameter  (s) ,  which  are  part  of  every  density  estimate...".  And 
the  problem  is  not  peculiar  to  density  estimation.  Splines, 
kernels,  and  the  newer  "recursive  partitions"  (see,  for  example, 
[23])  for  nonparametric  regression,  all  require  first  a  version 
of  smoothing  to  be  fully  defined. 

To  illustrate  the  problem  more  concretely,  in  the  context 
of  the  method  of  sieves,  let  us  return  to  the  convolution  sieve 
for  density  estimation  introduced  earlier  in  section  II: 

f  .2 

S,  =  {a:a(x)  =  -  e  F{dv),  F  an  arbitrary  cdf). 

^  J/I? 
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(This  example  also  gives  us  an  opportunity  to  present  some 
additional  work  done  during  the  current  contract  period.) 
Recall  our  description  of  the  associated  maximum  likelihood 
estimator  as  being  closely  related  to  the  Parzen-Rosenblatt 
kernel  estimator.  More  specifically  (see  [20]  for  details) 


Proposition^  For  each  n  and  X  define 


=  {oi€S,  :  n  a(x.)  sun  II  6(x.)} 

^  ^  i=l  ^  665^  i=l  ^ 

=  set  of  maximum  likelihood  solutions.  For  every  n  and  X 
is  nonempty,  and  if  ^  then 

X^  2 

?  X  ' 

a(x)  =  I  Pi  -  e 

i=l  /7¥ 


n 


n 


for  some  yi*****/^  Pl’'*'Pn  satisfying  p^  >  0,  1  £  i  <  n, 
n 

y  p-  =  1.  Furthermore,  if  min(Xi,...x  )  <  max(x,,...x  )  then 

^1  n  1  n^ 

min  (Xj  , .  .  .  x^)  <  min  (y^^ , .  .  .  y^)  and  max  (y^^ , .  .  .  y^^)  <  max  (x ^  , .  .  .  x^^)  , 

It  is  interesting  to  note  that  the  kernel  estimator  (with 
Gaussian  kernel) 

X^ ,  .2 

1  ?  X  ■•^Cx-x.) 

i  I  ' 

i  =  l  /Z¥ 

is  in  ,  but  the  last  statement  in  the  proposition  indicates 
that  6  is  not  among  the  maximum  likelihood  solutions,  i.e. 


^  If  in  the  definition  of  the  sieve  the  Gaussian  kernel  is 

replaced  by  a  double  exponential,  then  a  more  complete 
characterization  of  the  maximum  likelihood  solution  is  available. 
This  characterization  was  presented  recently  by  Blum  and  Walter 
[6]  in  a  paper  that  also  makes  an  interesting  connection 
between  the  convolution  sieve  and  other  methods  for  nonparametr ic 
density  estimation. 
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6  ♦  mJ. 


1 


Although  we  have  characgerized  the  maximum  likelihood 
set  up  to  the  2n  parameters  its  actual 

computation  is  difficult.  The  proposition  suggests  a  smaller 
and  computationally  more  attractive  sieve: 

7 


n 


=  {a:a(x)  =  —  T  — —  e 
1  =  1  /27r 


(x-Vi) 


i.e.  we  give  equal  mass  to  each  kernel,  but  allow  the  locations 
to  move  in  such  a  way  as  to  maximize  the  likelihood.  (Here 
again,  it  can  be  shown  that  the  kernel  estimator  is  not  among 
the  maximum  likelihood  solutions.)  We  have  experimented  with 
and  have  found,  as  a  rule,  that  the  number  of  distinct  y's  in 
a  maximum  likelihood  solution  is  considerably  smaller  than  n. 

In  other  words,  the  kernels  will  often  coalesce  to  achieve  an 
increased  likelihood.  Sometimes  this  results  in  strikingly 
accurate  density  estimators,  while  at  other  times  this  "maximum 
likelihood”  solution  is  a  poor  second  to  the  corresponding 
(same  window  width)  kernel  estimator.  In  either  case,  this 
estimator  suffers  the  very  same  stability  problem  as  the  kernel 
estimator:  the  results  are  critically  dependent  on  the  choice 

of  the  kernel  width  (which  is  here  governed  by  the  sieve  parameter 
X).  The  important  choice  of  a  good  X  is  the  smoothing  problem 
for  this  estimator. 


Again  referring  to  the  paper  by  Blum  and  Walter,  the  surprising 
result  for  double  exponential  kernels  is  that  the  kernels  are 
centered  at  rhe  observations.  However,  the  classic  kernel 
estimator  is  still,  in  general,  not  the  maximum  likelihood 
solution. 
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One  approach  to  this  critical  dependence  on  window  width 

(call  it  a)  is  to  include  o  as  a  free  parameter  within  the 

sieve,  and  thus  allow  it  to  be  chosen  by  maximum  likelihood. 

But  we  must  be  somewhat  careful;  we  cannot  merely  replace  S,  bv 

1  ? 

(x-yi)*- 


{a:a(x)  =  ^  I  — — 


2a‘ 


}. 


since  then  the  maximum  of  the  likelihood  is  achieved  with  o=0 
and  the  kernels  centered  at  the  sample  points.  Let  us  instead 
define  the  sieve  parameter  \  to  be  the  number  of  kernels, 
restricting  this  to  be  smaller  than  n  (A  <  n) ,  and  consider 


S,  =  {a;a(x)  = 


1 


1 

2a‘ 


(x-y.)' 


The  associated  maximum  likelihood  estimator  has  performed  well 

in  our  simulations,  and  we  can  demonstrate  (by  an  application 

of  the  "small-ball  technique")  that  if  A  -*■  «>  in  such  a  wav  that 

-I  y  n 

An  =  0(n^^^  ^)  for  some  e  >  0  then 


sup 


a(x)-aQ(x)  |dx  -►  0 


a .  s . 


n 

where  is  the  maximum  likelihood  set  associated  with  the  sieve 
S^,  and  Oq  is  the  true  density.  But  for  fixed  sample  the  degree 
of  smoothing  is  still  important:  with  moderate  sample  sizes 
(n  =  50),  A=1  generally  oversmooths  and  A=n-1  will  almost  always 
drastically  undersmooth. 
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Among  the  important  practical  questions  about  sieves  that 
remain  unanswered  (including  relative  efficiency,  asymptotic 
distributions,  good  sieves  for  robust  estimation,  etc.),  perhaps 
most  pressing  is  this  smoothing  problem.  Probably  the  most 
broadly  applicable  and  widely  studied  of  the  general  solutions 
proposed  to  the  smoothing  problem  is  the  method  of  cross- 
validation  (for  an  introduction,  see  Stone  [50]  and  Kahba  [54] 
and  the  many  references  therein).  We  have  experimented 
extensively  with  the  method  and  have  found  what  many  others  have 
found  (see  [44 ]  ,  [53 ]  ,  [54 ]  ,  and  [55]  to  name  just  a  few):  that 
cross-validation  is  often  times  a  strikingly  effective  means  of 
choosing  an  appropriate  degree  of  smoothing.  But  success  with 
the  method  is  not  guaranteed.  An  example  of  its  failure  on  a 
seemingly  canonical  problem  is  due  to  Schuster  and  Gregory  [42]. 
They  prove  that  cross -val idated  kernel  estimators  with  compact 
kernels  are  inconsistent  for  the  exponential  distribution.  They 
also  demonstrate  by  simulation  that  these  estimators  have  poor 
small  sample  behavior.  Other  authors  have  raised  doubts  about 
the  effectiveness  of  cross-validation  for  choosing  the  smoothing 
parameter  in  ridge  regression  (see,  e.g.  [14],  [30]). 

Since  there  is  very  little  known  analytically  about  cross- 
validated  estimators,  and  since  there  seems  to  be  a  real  practical 
need  for  a  better  understanding  of  the  method  (especially  in 
identifying  problems  to  which  it  can  be  successfully  applied), 
and  since  the  method  offers  a  natural  solution  to  the  smoothing 
problem  for  the  Method  of  Sieves,  we  have  spent  considerable  time 
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in  a  mathematical  study  of  cross-validation.  Subsection  2 
below  is  a  brief  introduction  to  the  method,  and  subsection  3 
presents  the  analytic  results  about  cross-validation  that  we 
have  obtained  during  the  current  contract  period. 


2 .  Smoothing  by  Cross-validation 

The  general  idea  behind  data-driven  smoothing  is  to 

measure,  as  a  function  of  the  smoothing  parameter  (call  it  X), 

the  ability  of  the  estimator  to  "explain",  or  to  "fit",  the 

observed  data.  X  is  chosen  to  maximize  this  measure  of 

explanation.  When  smoothing  by  cross-validation,  in  particular, 

the  measure  of  explanation  is  obtained  by  successively  deleting 

single  observations,  computing  the  estimator  from  the  remaining 

observations,  and  then  applying  the  estimator  to  the  deleted 

observation.  The  details  are  most  easily  illustrated  with 

specific  examples.  For  this  purpose,  we  will  use  the  kernel 

estimator,  call  if  f, 

A  j  ri 

'  B  Tj  XK(X(x-xp) 

for  some  probability  kernel  K.  We  will  denote  by  fj  ^  the 
estimator  computed  after  deleting  the  i'th  observation,  i.e. 


XK(X (x-Xj  ) )  . 


Now  f^  ,  is  not  dependent  on  x.,  and  f?  -.(x.)  may  be  taken 
X ,n-l  1  X  ,n-l  1 

as  a  measure  of  the  appropriateness  of  X  as  a  value  for  the 
smoothing  parameter:  If  f^  large,  then  it  might  be 

A  •  n  *  A  1 
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said  that  ,  "anticipated"  the  observation  x.,  and  that  > 

is  an  appropriate  degree  of  smoothing  (at  least  for  samples  of 
size  n-1);  small  values  of  f^  i(x.)  suggest  that  the  observa- 
tion  was  unlikely  (under  the  density  f^  ^  ^),  and  may  be 
interpreted  as  evidence  against  the  appropriateness  of  A.  As  i 
ranges  through  the  full  sample  we  obtain  n  such  measures  of  fit, 
and  these  may  be  combined  into  a  likelihood-like  expression: 


"  i 

n  f,  t(x.) 
X,n-1'-  i-* 


One  version  of  cross-val idated  density  estimation  (first  proposed 
by  Habbema  et  al .  [26],  and  separately  by  Duin  [12])  chooses  >. 
to  maximize  (call  this  value  X*,  the  "cross -val idated  smoothing 
parameter"),  and  then  forms  the  corresponding  estimator,  f 
(the  "cross-validated  kernel  estimator"). 

If,  instead  of  the  kernel  estimator,  f,  _  is  defined  by 

A  ^  n 


(x)  =  A  T 


.  (x){  I 
■’X^  i  =  l 


(a  histogram  with  bin  width  1/X,  and  an  instance  of  the  Method 
of  Sieves),  then  exactly  the  same  procedure  defines  a  cross- 
validated  smoothing  parameter,  and  a  resulting  cross -val idated 
estimator,  f^  could  in  fact  be  any  density  estimator  in  which  ' 
represents  the  degree  of  smoothing  (possibly,  X  is  vector  valuedl. 
For  example,  X  may  be  the  parameter  indexing  the  sieve,  in  which 
case  cross-validation  provides  a  fully  data-defined  Method  of 
of  Sieves  estimator.  And  the  setting  is  not  limited  to  density 


A. 
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estimation;  cross-validation  applies  to  regression  (with,  for 
example,  the  "1 ikel ihood - 1 ike”  expression  (3)  replaced  by  an 
analogous  "sum  of  squared  errors-like" expression)  and  many  other 
estimation  problems  as  well.  Whatever  the  setting,  cross- 
validation  can  be  used  to  automatically  smooth  an  estimator 
derived  by  the  Method  of  Sieves. 


3.  Results  about  Cross-val idated  Estimator 


We  have  studied  the  application  of  cross-validated 
estimators  to  both  infinite  and  finite  dimensional  problems. 

In  all  cases,  our  motivation  has  been  the  usually  good  behavior 
of  these  estimators  for  small  samples,  as  demonstrated  by 
computer  experiment  (see  subsection  D  below). 

Our  results  on  infinite  dimensional  estimation  problems 
concern  the  consistency  of  cross-validated  kernels  and  histograms 
for  nonparametr ic  density  estimation.  For  details  we  refer  to 


[  ]  -  but  here,  loosely  stated,  are  the  results:  let  f,  be 

A  ^  D 

either  the  histogram  estimator  with  bin  width  l/\  or  the  kernel 

A 

estimator  with  a  compact  kernel  and  window  width  1/X.  Let  be 

the  corresponding  cross-validated  smoothing  parameter,  given  n 

* 

observations  (see  previous  subsection  for  the  definition  of  ■'^). 


If  f,  the  target  density,  has  compact  support  then 
strongly  consistent  in  the  metric,  i.e. 


i  5 


(x)-f(x)|dx  0 


a .  s . 
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Most  instances  of  Method  of  Sieve  estimators  are  only 
implicitly  defined  -  as  the  solution  of  an  optimization 
problem  over  a  subset  of  the  parameter  space.  The  availability 
of  an  explicit  representation  for  histograms  and  kernel 
estimators  led  us  to  believe  that  cross-validated  versions 
of  these  estimators  would  be  relatively  easy  to  analyze.  In 
fact,  they  were  not.  The  proof  of  the  above-stated  consistency 
theorems  is  long  and  difficult.  And  it  is  probably  true  more 
generally  that  analysis  of  nonparametric  estimators  smoothed 
by  data-driven  techniques  is  extremely  hard. 

Some  estimates  for  finite  dimensional  (parametric)  problems 
also  contain  unspecified  smoothing  parameters,  and  these  too  can 
be  data-defined  by  cross-validation.  With  the  intention  of 
using  these  more  elementary  examples  to  learn  about  cross-valida¬ 
tion  in  general,  and  because  finite  dimensional  applications  of 
cross-validation  are  interesting  in  their  own  right,  we  have 
begun  a  mathematical  study  of  such  estimators.  Consider,  for 
example,  the  linear  regression  problem: 

^i  ”  ^il®l^  ■  ■  ■  ^^ip^p^^i '  ^  1  ^  1  ^  N(0,o"). 

Or,  in  vector-matrix  notation: 

Y  =  XB+e  e  ~  N(0,a^I). 

The  least  squares  (maximum  likelihood)  estimator  for  B  is 

6  =  (XX)  ^X  Y. 

The  ridge  estimator  for  B  is 
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=  (x'^X+nXI)  ■^x’^Y  X  >  0. 

Observe  that  6q  is  the  least  squares  estimator.  The  introduction 
of  X  into  the  least  squares  estimator  may  be  motivated  by  any 
of  the  following  considerations.  (1)  6^  minimizes 

||Y-X6||^  +  nXllBlI^ 

A 

Hence  6^  may  be  viewed  as  a  penalized  least  squares  estimator, 

with  penalty  for  large  values  of  6.  (2)  When  X  a  is  "nearly 

singular"  (poorly  conditioned)  Bp  has  large  MSE  due  to  the  fact 

T 

that  the  inverse  of  X  X  is  involved  in  its  derivation.  Adding 
T 

nXI  to  X  X  improves  the  conditioning  and  may  be  expected  to 

reduce  MSE.  (3)  Perhaps  the  best  justification  for  ridge 

regression  is  the  following  easily  demonstrated  fact;  for 

2 

every  n,  B,  and  a  >  0,  there  exists  a  X  >  0  such  that 
E||6-Sj|^  <  EllB-Bpll^ 


Unfortunately,  the  optimal  X  (in  terms  of  MSE)  depends  on  g 
2 

and  o  ,  so  that  we  are  again  faced  with  a  version  of  the 
smoothing  problem. 

Let  us  define  a  cross-validated  version  of  the  estimator 

A  A  • 

Bj^.  Define  B^  to  be  the  ridge  estimator  obtained  by  deleting 
the  i'th  observation.  The  squared  error  in  predicting  the  i'th 


observation, 


P  -i  7 

'i  ■  jll  ’ 


measures  the  appropriateness  of  X  as  a  smoothing  parameter. 


24 


Def ine 


=  n  I  iy’i  -  I 

n  1  ij  Aj 


and  choose  A=A  to  minimize  L, .  The  cross-validated  ridce 
n  A  ^ 

/V 

estimator  (due  to  Allen  [4])  is  6^  .  Our  simulations  (see 

n 

subsection  D  below),  and  those  of  others  (see,  for  example 

[30])  indicate  that  6^  can  be  an  extremely  good  estimator 

n  'j' 

for  B,  especially  when  X‘X  is  nearly  singular  or  a  is  large. 

In  almost  every  case  that  we  have  studied,  the  mean  squared 
error  of  the  cross -val idated  ridge  regression  estimator  is 
smaller  than  that  of  the  ordinary  least  -  squares  estimator. 
Often,  the  ridge  estimator  reduces  the  MSE  of  least  squares 
by  50  or  more  percent. 

There  is  a  closely  related  estimator,  due  to  Golub,  Heath, 
and  Kahba  [21],  called  the  "generalized  cross-validation"  (GC\’') 
ridge  regressor.  The  GCV  ridge  regressor  is  computed  by  first 
rotating  the  coordinate  system  and  then  deriving  the  ordinary 
cross-validation  (OCV)  estimator.  Simulations  demonstrate  that 
GCV  generally  performs  somewhat  better  than  OCV,  and  GCV  is 
more  easily  computed  and  has  proven  to  be  more  mathematically 
tractable.  Although  the  analytic  results  mentioned  below  are 
for  both  GCV  and  OCV,  we  will  not  formally  define  the  GC\ 
estimator  since  this  would  require  that  we  introduce  somewhat 


involved  notation. 
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Here,  again  loosely  stated,  is  what  we  know  about  the 
analytic  properties  of  cross-validated  ridge  estimators:^  If 

/V 

6^  is  the  GCV  or  OCV  ridge  regressor  then 

A. 

6^  -S||  0  a.s. 

'^n 

and  -6)  -  N(0,a^I). 

''n 

Observe  that  for  least  squares  the  distribution  of 

T  1  /  2  ■' 

CX^X)^/^(6g-6) 

2 

is  exactly  N(0,a  I).  Thus  the  cross -val idated  estimator 
asymptotically  assumes  all  of  the  distribution  properties  of 
the  ordinary  regression  estimator,  while  for  small  samples 
typically  improving  on  ordinary  regression  (as  measured  by 
mean  square  error) . 

Principal  component  analysis  is  another  modification  of 
least  squares  regression  that  is  used  (among  other  reasons)  for 
problems  with  ill-conditioned  design  matrices  or  small  samples. 

It  has  been  suggested  (see  [13],  for  a  discussion  in  a  slightly 
different  context)  that  cross-validation  may  be  an  effective 
means  for  choosing  the  number  of  components  retained  in  a 
principal  component  analysis.  Our  conclusions  with  regard  to 
cross-validated  principal  component  analysis  parallel  those  for 
the  cross-val idated  ridge  regressor;  simulations  demonstrate  its 
potential  superiority  over  ordinary  repression  for  finite  samples, 

A  more  precise  statement  of  this  and  other  results  for  finite 
dimensional  problems,  together  with  proofs,  will  appear  in  a 
forthcoming  manuscript  by  Geman  and  graduate  student,  Aytul  Erdal . 


26 


and  we  can  show  that  cross -val idated  principal  component  analysis 
is  asymptotically  equivalent  to  least  squares.  (For  the  latter, 
one  simply  shows  that,  almost  surely,  for  all  sample  sizes 
sufficiently  large  the  number  of  components  chosen  by  cross- 
validation  equals  the  real  dimension  of  the  regression  surface.) 
Details  will  be  in  the  manuscript  by  Erdal  and  Geman. 

C .  Studies  on  Specific  Method  of  Sieves  Estimators. 

In  addition  to  general  questions  of  consistency  for 
Method  of  Sieves  estimators,  we  have  made  a  study  of  individual 
properties  for  some  narticular  instances  of  the  method.  Two 
examples  are  discussed. 

1 .  Estimation  of  a  Poisson  Intensity  Function. 

One  problem  area  that  we  have  studied  from  several 
perspectives  is  the  estimation  of  the  intensity  function  of  a 
nonhomogeneous  two-dimensional  Poisson  proce.<«  based  -P"  the 
observation  of  random  projections  of  the  points  in  a  realization 
of  the  process.  Motivation  to  study  these  problems  comes  from 
their  direct  application  to  computed  tomography  --  the 
reconstruction  of  the  structure  (intensity  function)  of  a  body 
in  two  or  three  dimensions  on  the  basis  of  sets  of  projections 
of  the  body  to  one  or  two  dimensions. 

Our  work  to  date  has  developed  a  basic  understanding  of 
theoretical  properties  of  estimators  derived  from  the  method  of 
sieves  and  the  principle  of  maximum  likelihood  [  53 , 34  ,  ],  convergence 

properties  of  computational  algorithms  for  solving  the  very  large 
scale  optimization  problem  when  maximum  likelihood  is  used  for 
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image  reconstruction  [  1,34],  and  a  software  implementation  for 
a  microcomputer  of  two  alternative  algorithms  for  computing  the 
m. 1 . e .  [  1  ]  . 

The  mathematical  problem  is  related  to  single -photon 
emission  tomography  and  to  other  nuclear  medicine  applications 
in  section  V  below.  Closely  related  applications  and  mathematical 
formulations  are  given  in  the  recent  work  of  Shepp  and  Vardi  [45]. 
In  our  work,  however,  we  have  concentrated  on  a  connection 
between  the  continuous  spatial  background  Poisson  process  on 
one  hand  and  estimates  of  its  intensity  function  in  the  form  of 
digital  pictures,  on  the  other  hand.  The  relationship  between 
the  unknown  function  on  a  continuum  and  discretized  estimates 
of  it  fits  very  naturally  within  the  framework  of  the  method  of 
sieves . 

The  mathematical  and  inference  problems  take  the  following 
form.  Ke  describe  the  formulation  in  two  dimensions;  there  are 
no  essential  changes  in  the  general  formulation  or  in  the  nature 
of  the  results  in  three  dimensions,  but  the  notation  is  a  bit 
more  cumbersome. 

N 

Let  ^  ^  ^  i-1  ^  realization  ovcr 

the  time  interval  [0,T]  of  a  Poisson  process  with  intensitv 
function  p(x,y)  per  unit  time.  The  function  o  is  unknown  and 
its  estimation  is  our  goal.  The  compact  support  of  p  is 
known.  For  simplicity,  we  take  C  to  be  the  unit  square  centered 
at  (0.0).  The  realization  of  the  process  is  evolving  with  time  t. 
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In  fact,  is  Poisson  with  mean  EN^  ~  ]  j 

assume  p  is  in  (fi) .  ^ 

The  points  ^re  not  directly  observable.  Rather, 

we  observe  a  rav  with  an  orientation  0.  and  distance  S.  from 

-  1  -  1 

the  origin.  The  orientation  0.  is  uniformly  distributed  on 

N.p  ^ 

[0,2tt);  ®re  mutually  independent  and  are  independent 

of  The  distance  is  a  function  of  Xj.Tj  and  0^; 

an  oriented  line  in  direction  0^  is  passed  through  the 

point  (X^,Y^)  and  is  the  signed  distance  =  -X^  sin  0^  + 

Y.  cos  0.  from  the  origin  to  L. .  Our  observables  are  the  N_ 

N„  ^  ^ 

points  .  (In  the  single-photon  tomography  applications, 

one  actually  observes  grouped  data  in  place  of  the  points  (0^,sp, 

and  effects  from  attenuation  and  photon  scattering  must  be 

accounted  for.) 

The  problem  is  then  to  estimate  p  on  its  domain  SI  from 
observation  of  ^  (0^  .  S^) }  . 

No  a  priori  restrictions  are  imposed  on  p,  except 
p€Lj(f2).  Reasonable  nonparametric  estimators  can  be  expressed 
through  the  method  of  sieves.  One  aspect  of  these  formulations 
is  distinctly  different  from  problems  analysed  previously:  the 
observables,  even  when  conditioned  on  N.^.,  do  not  constitute  an 
iid  sample  with  intensity  (viz.  density)  p.  Instead,  the 
observables  represent  an  iid  sample  from  a  density  proportional 
to  Rp,  where  R  is  an  integral  operator  on  1^(0).  To  estimate  p, 
we  need  to  address  the  problems  of  "inversion"  of  R,  topological 
and  operator  theoretic  problems  such  as  identifying  the  appropriate 
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domain  for  R,  demonstrating  that  it  is  invertible  on  the  image 
of  that  domain,  and  understanding  continuity  properties  of  R 
and  R  ^ . 

Results  about  consistency  of  nonparametric  estimators  of 
p  and  about  characterization  of  algorithms  for  computing 
estimators  in  special  cases  are  reported  in  [34,35].  First  we 
confirm  that  ) }  ^_  constitutes  a  realization  of  a  Poisson 

process  on  a  region  Q;  the  intensity  function  of  the  process  is 


D(e,s)  =  ^  T  p(x,y)d£  =  ^  Rc(e,s), 

Le(s} 

wlicre  the  line  integral  is  taken  over  the  straight  line  path 
LgCs)  passing  through  (x,y)  having  positive  orientation  6,  and 
satisfying  s=-x  sin  6+y  cos  6;  I  denotes  arc  length  on  Lq(s). 

The  transform  domain  n  is  depicted  in  the  Figure  1.  It  is  the 

set  of  all  (0,s)  for  which  the  corresponding  line  intersects 
the  square  S2.  R  is  the  Radon  transform  [27]. 

In  the  general  approach  followed  in  [35],  the  main  steps 

A. 

are  (i)  to  specify  a  sieve  and  estimate  p  by  maximizing  the 
likelihood  of  {(0^,5^)}  over  the  sieve,  and  (ii)  to  map  the  m.l.e. 
p*  of  p  into  an  estimator  p*  of  p  =  R  p. 

The  first  step  is  reasonably  straightforward,  except  for 

A 

purely  technical  problems  associated  with  the  domain  and  the 

A 

unusual  form  relative  to  Q  of  natural  sieves.  The  likelihood 
function  assumes  a  convenient  form  for  analysis: 


JO  > 


pixel  , 


support  of 

I 

j 

I 

1  domain  Q 

[-.5, .5] 

□  : 

1 

1 

pixel  B,  i 

^  \ 

support  of  1 

transform  domain 


Figure  1 
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where  y  ~ 


p  (0  ,  s)d0ds . 


The  most  "natural"  sieve,  thinking  in  terms  of  potential 

applications  to  emission  tomography  and  construction  of  digital 

pictures,  is  based  on  a  partition  of  Q  into  X  congruent  s  ,uare 

pixels.  Let  denote  the  sets  in  this  partition  and  let  be 

the  indicator  function  of  b\  j  =  l  to  X  .  Then 

1 


=  {p  €  Lj (n) 


r  .  X 

j=l  ^  ^ 


The  image  of  in  determines  the  sieve 


=  {p  e  :  P 


I 

j=l  J  ^ 


where  =  RcJ^.  Each  of  the  basis  functions  !p^(d,s)  for  S, 

11  1 
A 

has  a  support  inside  0.  in  the  shape  of  a  narrow  sinusoidal  band 
(see  Figure  1);  the  support  of  consists  of  the  oriented  lines 
(0,s)  that  intersect  the  pixel  Bj .  Within  its  support,  for  any 
fixed  0Q,  !;;j(0Q,s)  is  a  piecewise  linear  function  of  s. 

Now  p  is  estimated  by  fixing  X  =  X,p  and  finding  a  maximiring 
function  p.p  in  of  the  likelihood  J/’ip)  .  Except  for  the 

unusual  structure  of  the  basis  functions  ,  this  is  a  straight¬ 
forward  implementation  of  the  method  of  sieves.  We  can  show, 

A 

using  the  "small-ball  technique"  on  Lj^  (11)  that  if 
(i)  X.J.  =  o(T^^^  ^)  for  some  e  >  0,  and 


:h,  ff 


p(0,s)S,n  p(0,s)d0ds  <  <»  ,  then 


A  ^ 

(peeks')  -  p^(0,s)|d0ds  =  0  with  probability  one. 
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Condition  (i)  is,  we  feel,  not  sharp,  and  we  are  continuing 
to  investigate  this. 

It  is  a  delicate  matter  to  translate  the  consistency  of 

-1^**  .1 

P.J.  into  a  consistency  theorem  for  R  Pj  =  p^.  Indeed,  R  is 
not  a  continuous  mapping  from  Lj(fi)  into  Lj(fi).  h’e  can,  however, 
regard  R  as  an  operator  on  a  domain  other  than  and  obtain 

useful  continuity  properties.  Two  approaches  are  possible, 
based  on  results  about  the  Radon  transform  developed  by 
Ludwig  [31]  (i)  we  can  restrict  R  to  a  domain  smaller  than 

Lj ,  impose  strong  regularity  conditions  and  a  stronger  topologv 
on  that  domain  and  obtain  a  useful  continuity  property  for  R, 
or  (ii)  we  can  extend  R  to  a  domain  larger  than  Lj ,  relax 
conditions  on  elements  of  the  domain,  prescribe  a  weaker  topology 
for  the  enlarged  domain,  and  we  obtain  a  bicontinuous  extended 
operator  R.  The  first  approach  requires  us  to  impose  unrealistic 
smoothness  restrictions  on  the  intensity  function  p  being 
estimated,  so  the  usefulness  of  theorems  derived  by  this  approach 
is  limited.  The  second  approach  adheres  to  the  spirit  of  truly 
nonparametric  inference  by  not  imposing  additional  restrictions 
on  p.  The  price  paid  by  the  theory  that  consistency  results 

are  expressed  in  a  weaker  topology,  that  is,  we  will  conclude 

* 

p.p  -*■  D  in  a  weak  topology  as  T  -*•  <»  (but  still  in  a  strong  sense 
probabilistically,  i.e.  almost  surely).  The  approach  bears  a 
strong  resemblance  to  methods  used  to  prove  existence  of  solutions 
of  PDFs  by  first  confirming  existence  of  a  solution  in  a  weak 
sense  and  then  analyzing  the  regularity  properties  of  the  weak 
solution . 
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For  the  oresent  problem,  starting  with  Lj(fi)  as  our 
space  for  p,  it  is  natural  to  use  an  extension  of  R  to  the 
space  of  Schwartzian  distributions  with  compact  support. 

Any  element  of  is  such  a  generalized  function  since 

f  p  (x  ,y)  (x  ,y)  dxdy  is  a  continuous  linear  -functional  on  the 
^  -  2 

space  =  C  OR  )  of  test-functions  <J).  Details  of  the  extension 
of  R  are  given  in  [27]  and  [31].  The  extended  R  is  a  bicontinuous 
biiection  from  to  a  comnletely  characterized  subspace 
Sf'  n  N"*"  of  . 

^  * 

Now  we  can  translate  convergence  of  Pj  into  convergence 

of  o.p.  The  strong  norm  convergence  of  Oj  to  p  imnlies 

convergence  in  the  weaker  topology  of  the  range  n  ,  which 

- 1  * 

in  turn,  from  continuity  of  R  ,  implies  convergence  of  p.p  to 


p.  Specifically,  for  any  <p  in  C  OR  )  ,  lim  d-(x  ,v)p  fx  ,v)dxd^•  = 

T^»  * 


P  (x  ,y)  1^1  (x  ,  y)  dxdy  with  probability  one. 


This  general  analytical  strategy  is  applied  in  [55]  to 
determine  consistency  of  estimators  based  on  alternative  sieves. 

Other  results  are  reported  in  [34]  where  we  consider 
intensity  functions  p  which  are  niecewise  constant  on  subsets 
of  fl  and  we  wish  to  estimate  the  unknown  sets.  He  relate  tlic 
set  estimation  problems  to  more  familiar  nroblems  of  estimating 
a  univariate  unimodal  density  function.  That  connection  allows 
us  to  adapt  consistency  and  asymptotic  distribution  results 
[38.  57].  to  characterize  nonparametric  m.l.e.s  in  terms  of 
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convex  envelopes  of  counting  functions  of  the  projected  point 
process,  and  to  obtain  direct  computational  algorithms  for  the 
m.l.e.  along  with  results  on  the  complexity  of  the  algorithms. 

In  directly  related  work,  graduate  student  Nicholas  Accomando 
has  designed  and  implemented  algorithms  on  a  microcomputer  for 
computation  of  estimators  of  p  based  on  maximum  likelihood  and 
the  method  of  sieves.  The  algorithms  are  designed  with  the 
single-photon  emission  tomography  problems  in  mind  and  they 
allow  for  effects  of  photon  attenuation.  Thus  they  implement 
inversion  of  the  more  general  attenuated  Radon  transform  R 

V 

(see  section  V)  of  which  R  is  a  special  case.  The  software 
developed  for  the  Data  General  MP-200  microcomputer  uses  a 
combination  of  PASCAL  and  assembler  code.  The  assembler 
language  modules  are  documented  in  PASCAL  in  order  to  facilitate 
implementation  on  other  computers.  The  programs  implement  both 
gradient  methods  of  calculating  the  m.l.e.  and  the  EM  algorithm 
analyzed  and  used  by  Shepp  and  Vardi  [45].  The  software  design 
problem  is  a  hard  one  since  it  requires  the  balancing  of  the 
constraints  of  a  small  computer  and  of  a  very  big  optimization 
problem.  Accomando's  system  can  handle  the  method-of - s ieves 
estimator  with  A  =  8100  free  parameters,  corresponding  to  a 
90x90  digital  picture;  the  MP-200  has  32k  16-bit  words  of  high 
speed  memory  to  which  Accomando  has  added  a  board  with  64k  16-bit 
words.  The  programs  will  reconstruct  a  60x60  phantom  in  about 
fifteen  or  twenty  minutes,  depending  on  the  algorithm  (EM  or 
con jugate- gradient )  and  on  the  number  of  iterations  needed. 
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Accomando's  implementation  is  convincing  proof  of  the  feasibility 
of  maximum- 1 ikel ihood  and  the  method-of-sieves  for  solving 
large  scale  problems  with  readily  available  computational 
resources . 


2 .  Convergence  Rates  and  Asymptotic  Distribution 
for  some  Canonical  Sieves. 

The  general  question  of  convergence  rates  and 
asymptotic  distribution  for  sieve  estimators  is  extremely 
difficult.  Professor  H.T.  Nguyen,  in  personal  correspondence, 
has  suggested  that  estimators  based  on  sieves  which  consist  of 
increasing  subspaces  of  a  Hilbert  space  would  be  particularly 
amenable  to  thorough  mathematical  analysis.  A  recent  paper  by 
Nguyen  and  Pham  [37]  puts  this  idea  to  good  use.  They  utilize 
a  sieve  of  this  type  to  estimate  the  drift  frunction  of  a 
repeatedly  observed  non- stationary  diffusion,  and  they  are  able 
to  partially  specify  rates  of  convergence  and  asymptotic 
distribution . 

Here  is  a  more  elementary  example  of  Nguyen's  idea,  which 
we  have  explored  in  some  detail.  Recall  the  nonparametr ic 
regression  problem  discussed  briefly  in  section  II.  Let  us 
here  take  x,  the  "independent"  variable,  to  be  deterministic. 

We  then  think  of  the  distribution  of  Y  as  being  an  unknown 
function  of  x,  ( ' )  •  For  this  example,  we  will  assume 
xe  [0,1].  The  problem  is  then  to  estimate 


OpCx)  =  E^[Y] 


00 

yF^(dy)  xe|0,l] 


•  oo 
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from  independent  observations  ^i  ~ 

Xj^,...x^  is  a  deterministic ,  so-called  design,  sequence.  In 
other  words,  for  each  i=l,2,...n  we  make  an  observation,  y^, 
from  the  distribution  F  ,  and  from  these  observations  we  wish 

X .  ’ 

1 

to  estimate  the  mean  of  Y  as  a  function  of  x.  For  a  specific 
example,  let  us  assume  that  the  design  sequence,  for  fixed  n, 
is  equally  spaced  on  the  interval  [0,1]; 

X.  1  =  1,2, ...n. 

j  n  J  •  t 

Of  course,  an  unconstrained  minimization  of  the  sum  of 
square  errors 

(yi-ct(x.))“ 

will  not  produce  a  useful  estimator.  Using  the  Hilbert  space 
L2  ( [0 , 1 ] ,dx) ,  a  sieve  of  the  type  suggested  by  Nguyen  is  the 
"Fourier  sieve" 

Sjjj  =  {a(x):a(x)  =  J  aj.e‘^^^^}; 

k=-m 

it  is  particularly  tractable  and  makes  for  a  good  illustration 

of  his  idea.  The  sieve  size  is  governed  by  the  parameter  m, 

which  will  be  allowed  to  increase  to  infinity  with  n.  If  we 

restrict  m  so  that  m  <  n  for  all  n,  then  a  is  uniquely  defined 
n  n  —  n  ’ 

by  requiring  that  it 


r  ' 

minimize  )  (v.-a(x.))  subiect  to  aCS 

.  ^  ''  1  1  •  II 
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Because  of  the  subspace  character  of  the  sieve,  it  is  quite  easy 

to  show  that  for  any  sequence  m  such  that  m  /n  0  and  iti  <  n, 

'  ^  n  n  n  -  ’ 

'a  0  1  1 

E  |c.„(x)-a„(x)|2dx  -  0(L  .  ^ 

0  n  /n 

as  n  In  particular,  if  -■  /n,  then 

1 

E  la  (x)-a„(x)|^dx  =  0(— ) 

J  n  u 

0  " 

as  n  -*•  (All  under  some  very  mild  assumptions  -  see  [18] 

for  details . ) 

A 

What  does  the  least  squares  estimator,  a^ ,  look  like? 

A  simple  calculation  gives  the  explicit  form: 


1  =  1  n 


where  D  is  the  Dirichlet  kernel 
m 


Dm(x)  = 


_-2Trikx  _  sin  TT(2m-t-l)x 


sin  TTX 


Here,  then,  the  least  squares  (sieve)  estimator  turns  out  to 
be  a  kernel  estimator.  Kernel  estimators  for  nonparametric 
regression  have  been  widely  studied,  although  from  a  somewhat 
different  point  of  view.  See  [  5]  ,  [  11],  [1^  ,  [  4 3]  and  [48]  for 
some  recent  examples.  It  is  not  too  difficult  to  now  exploit 
this  simple  form  for  a^,  and  say  u  good  deal  more  about  its 
behavior.  Let 


V(x)  =  (y-ap(x))^F^(dy) , 
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the  variance  of  Y  at  x.  Then  if  OqCO)  =  OqCI),  and  if 

g  11 

such  that  =  0(n  )  for  some  j  ^  2’  process 

t 

Pj^Ct;)  H  /n  j  (a^(x) -OqCx)  )dx 
0 

converges  weakly  on  [0,1]  to  the  diffusion,  pft),  defined  by 

dp(t)  =  ATtT  dW^,  p(0)  =  0 

where  is  standard  Brownian  motion.  (Again,  see  [18] 
for  details.) 

The  conditon  oiqCO)  =  ^gCl)  is  awkward,  but  it  cannot  be 
removed.  It  is  a  consequence  of  the  sieve,  ,  which  admits 
only  functions  which  are  continuous  on  the  unit  torus.  Another 
sieve  of  the  subspace  type,  more  natural  in  the  absence  of  the 
assumption  ag(0)  =  OgCl),  is 

I 

S„  =  {oi(x);a(x)  =  I  a,  cos  [k  arc  cos(2x-l)]} 

^  k=-m 

i.e.  replace  the  trigonometric  polynomials  by  the  Chebyshev 
polynomials.  Here  we  would  want  to  choose  a  design  sequence 
which  preserves  the  orthogonality  of  the  basis  sequence; 

Xj  =  2  7  cos  [2j-l  )Tr/2n]  ,  j=l,2,...n. 

» 

We  presume  that  the  above  results  have  their  analogues  for  5^ 
as  well. 

Still  a  good  deal  more  can  be  said  about  the  estimator 
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With  suitable  restrictions  on  the  growth  of  m 
pointwise  convergence  (E |  Cx) -Qq (x)  |  -*■  0  for 

pointwise  asymptotic  normality;  and  a  relation 
smoothness  of  Oq  and  the  rate  at  which 

E  I (x) -Oq (x) I  dx  converges  to  zero. 

J 
0 

D.  Computer  Experiments 

One  of  the  most  powerful  strategies  for  our  past 
research  on  the  method  of  sieves  has  been  to  systematically 
use  computer  experimentation.  The  experiments  often  involve 
simulation  of  the  processes  and  inference  methods  which  we  are 
analyzing.  And  frequently  the  experiments  do  not  involve 
simulation  per  se,  but  use  the  machine  to  solve  numerically 
the  analytical  hurdles  that  come  up  in  a  purely  theoretical 
analysis.  Both  kinds  of  experiments  have  been  invaluable 
complements  to  our  analysis.  They  have  provided  direction  and 
reinforced  confidence  in  the  day-to-day  evolution  of  the 
theoretical  work;  they  have  made  it  possible  for  us  to  understand 
and  to  demonstrate  connections  between  asymptotic  theorems  on 
one  hand  and  small-sample  properties  of  method -of  -  si  eves  estimators 
on  the  other  hand;  and  they  have  frequently  suggested  hypotheses 
to  us,  providing  new  ideas  to  be  confirmed  later  by  rigorous 
analysis . 

Most  of  our  experiments  have  been  carried  out  interactively, 
usinp  a  substantial  library  of  APL  programs  developed  by 
Grenander  and  McClure  [25]  as  a  tool  for  experimental  mathematics. 


,  we  can  establish: 
each  x€ (0 , 1 ) ) ; 
between  the 


40 


The  library  has  been  assembled  systematically  over  the  nast 
four  years.  It  is  well  documented  and  is  now  being  shared 
with  mathematicians  elsewhere.  The  examples  of  experiments 
described  below  have  been  facilitated  by  the  availability  of 
the  programs  for  interactive  graphics,  simulation,  linear 
algebra  and  matrix  spectrum  analysis,  computational  geometry, 
quadrature,  and  discrete  Fourier  analysis. 

Example  1 .  The  theoretical  understanding  of  cross-validated 
ridge  regression  is  essentially  complete,  at  least  the 
asymptotic  properties  are  clear,  since  the  cross-val idated 

estimator  6^  has  been  shown  to  have  the  same  asymptotic 

''n 

distribution  as  the  least  -  squares  estimator  of  ?.  Extensive 

simulations  have  been  useful  for  demonstrating  fi)  the  small - 

sample  behavior  of  6^  ,  (ii)  the  relationship  between  ordinary 

"n 

and  generalised  cross - va 1 i dat ion ,  (iiil  the  sensitivity  of  the 

A  j 

performance  of  B,  to  i 1 1 -cond it ioning  of  X  X,  and  (ivl  the 
dependence  for  small  -  samples  of  the  mean-squared-error  of 
on  the  error  variance  o". 

Two  examples  of  the  simulation  results  are  depicted  in 
Figures  2  and  3.  The  ordinate  on  each  graph  is  a  so-called 
Relative  Error  for  a  cross-val idated  (ordinary  or  gcneralicedi 

A  J 

ridge  regression  estimator  6^  of  6  in  the  model  Y  =  6  X+c, 

where  X  is  nxp  with  n=30  and  p=5,  the  variance  of  c  is  o"I 

(a=.01  is  Figure  2),  and  a  is  an  index  of  the  i 1 1  - condi t ion ing 

T 

of  the  5x5  matrix  X  X  (a=0.8  in  Figure  3);  the  eigenvalues  of 
T  2 

X  X  are,  on  the  average,  1-a  with  multiplicity  four  and  l+4v 


Relative  Errors  for  Cross- Valdiated  Estimators 
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Figure  2 
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T 

with  multiplicity  one,  so  X  X  collapses  to  a  rank  one  matrix 
as  a  goes  to  1. 

The  precise  descriptions  of  the  two  mean  -  squared -error s  , 

A 

prediction  and  Euclidean,  used  to  assess  3^  are  not  particularly 

important  for  describing  the  general  conclusions  that  can  be 

drawn  from  these  experiments.  It  suffices  to  know  that  the 

Relative  Errors  are  ratios  of  a  mean- squared-error  for  the 

cross-validated  estimator  to  that  mean-squared-error  for  the 

least  squares  estimator  of  6.  The  asymptotic  efficiency  of  the 

least  -  squares  estimate  when  e  is  Gaussian  together  with  the 

asymptotic  distribution  theorem  for  6^  (see  p.25)  combine  to  show 

that  for  any  fixed  values  of  the  parameters  a  and  c,  the 

Relative  Error  goes  to  1  as  n  goes  to  The  difference  from  1 

for  these  small  sample  (n=30)  experiments  is  striking.  Both 

ordinary  and  generalized  CV  estimators  perform  significantly 

better  than  the  asymptotically  efficient  least  -  squares  estimator. 

The  results  also  exhibit  the  greater  advantage  of  ridge  regression 

T 

with  cross-validation  as  X  X  becomes  more  singular. 

These  simulations  were  carried  out  fairly  early  during 
Erdal  and  Geman ' s  study  of  properties  of  cross -val idated  ridge 
regressors  and  the  experiments  had  a  direct  influence  on  the 
direction  of  the  theoretical  developments.  The  pictures  provided 
convincing  evidence  that  the  ridge  regressors  are  in  general  as 
good  as  (and  usually  better  than)  ordinary  least  squares.  In 
addition,  the  relative  small  differences  in  simulation  results 
for  the  ordinary  compared  to  the  generalized  cross-validated 
estimators  stimulated  the  successful  and  more  intricate  analysis 
of  the  ordinary  cross -val idated  estimator. 
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Example  2.  Experiments  have  been  carried  out  in  connection 
with  method-of- sieves  estimates  of  two-dimensional  Poisson 
intensity  functions  and  R-spline  estimates  of  univariate 
density  functions.  The  two  estimation  problems  are  directly 
related  to  each  other.  The  experiment  shave  been  used  to 

(i)  determine  the  small  sample  behavior  of  estimators  whose 
consistency  is  reasonably  straiphtforward  to  prove  and 

(ii)  explore  questions  of  consistency  for  cross -val idated 
estimators,  whose  consistency  properties  are  not  yet  known. 

Figure  4  depicts  a  realization  of  a  two-dimensional  ^oisson 
process  on  the  square  ^2.  For  this  particular  experiment,  the 
nonuniform  intensity  function  assumes  two  values,  corresponding 
to  a  "hot"  region  K  c  n  surrounded  by  lower-level  background 
Poisson  events.  The  unknown  region  K  is  assumed  to  be  convex, 
but  otherwise  arbitrary  and  the  goal  is  to  reconstruct  K  from 
a  finite  set  of  independent  projections  of  the  Poisson  Points. 

The  points  plotted  at  the  bottom  of  the  figure  are 
projections  to  the  x-coordinates  alone  of  the  Poisson  points 
in  n.  The  projected  points  are  a  realization  of  a  one-dimensional 
Poisson  process  with  a  nonhomogeneous  intensity  that  is  unimodal 
and  nonuniform  in  the  interval  "shadow"  of  K.  An  estimation 
procedure  analyzed  in  [34]  is  based  on  using  the  univariate 
process  to  infer  the  location  of  K's  shadow.  Tn  the  particular 
case  of  polygonal  K,  the  univariate  intensity  function  is  a 
spline  of  order  2  (piecewise  linear)  and  spline  estimators  are 
natural . 
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The  function  depicted  in  Figure  4  is  an  estimate  for 

this  particular  realization  of  the  process  of  the  univariate 

intensity.  It  is  a  linear  combination  of  R-splines  with 

equally-spaced  knots.  Once  the  number  of  knots  is  specified, 

the  coefficients  of  the  basis  functions  that  determine  the 

estimate  are  found  by  maximizing  the  likelihood  function. 

The  general  recipe  for  establishing  consistency  [l9]  applies 

here  to  assure  that  the  B-spline  estimators  are  consistent  if 

1  /4  - 

the  number  of  knots  goes  to  ®  suitably  slowly  (0(n  ))  as 

sample  size  n  increases.  But  for  a  fixed-size  samnle,  the 
problem  of  specifying  the  number  of  spline  knots  is  another 
instance  of  the  recurrent  smoothing  problem  in  nonparametr ic 
estimation . 

The  spline  estimate  in  Figure  4  lets  the  data  prescribe 
the  appropriate  number  of  knots  by  half-sample  cross-validation 
maximizing  a  cross-validation  likelihood  with  respect  to  the 
number  of  knots;  one-half  the  sample  is  deleted  for  each  of 
two  factors  in  the  cross-validation  likelihood,  instead  of 
deleting  one  sample  unit  for  each  of  n  factors  as  in  eqn.  III.P 

The  experiments  that  we  have  done  suggest  that  the  half¬ 
sample  cross -val idated  B-spline  estimators  are  effective  for 
problems  such  as  this  one  where  a  target  density  (or  intensity! 
has  compact  support.  The  theoretical  confirmation  of  any 
consistency  properties  remains  a  tempting  and  elusive  problem. 
The  gap  between  this  problem  and  known  results  occurs  because 
the  estimators  here  are  not  prescribed  explicitly,  but  only 
implicitly  through  the  principle  of  maximum  likelihood. 
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The  experimental  results  present  compelling  evidence  that 
consistency  results  analogous  to  known  ones  will  hold  for  much 
more  general  types  of  estimators  than  ones  that  admit  explicit 
representation.  Another  instance  is  depicted  in  Figure  5. 

Here  a  cubic  spline  with  m  equally  spaced  knots  is  used  to 
estimate  a  density  function  for  a  data  set  used  by  Boneva , 

Kendall  and  Stefanov  [  7 ]  to  illustrate  alternative  spline 
estimators.  The  B-spline  coefficients  are  determined  by 
maximum  likelihood  and  the  number  of  knots  is  found  by  maximizing 
the  half-sample  cross-validation  likelihood.  Our  exnerience 
with  the  experiments  encourages  continuing  analysis  of  the 
theoretical  problems  as  described  in  section  IV, A. 2  below. 

Fxneriments  that  are  similar  in  spirit  to  the  two  described 
above  have  been  instrumental  in  focusing  our  attention  on  imiiortant 
aspects  of  the  theoretical  analysis  of  (i)  the  consistency 
problem  for  cross- val idated  histograms  and  kernel  estimators 
[10]  and  (ii)  complex  operator  equations  that  characterize  the 
bias  of  certain  estimators  of  regression  functions  of  two 
variables  (see  IV. B).  For  the  cross-validation  problem,  the 
experiments  reported  by  Schuster  and  Gregory  [42]  pointed  to  the 
pivotal  role  of  spacings  in  deciding  the  consistency  oF  density 
estimators.  Our  own  simulations  are  important  in  the  formulation 
of  simple  sufficient  conditions  for  consistency.  For  the 
surface  regression  problems,  the  striking  simplicity  of  the 
numerical  solutions  of  very  large  systems  of  linear  equations 
pointed  the  direction  for  proving  general  properties  of  those 
solutions . 


Figure  5 


STEPHENS 
N  =  76 

K  =  4  ;  L  =  12 
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ly.  Projects  in  Progress 

As  of  June  1,  1983  a  number  of  projects  were  in  progress,  at 
varying  stages  of  completion,  that  relate  directly  to  our  research 
program  on  nonparametric  estimation  by  the  method  of  sieves.  These 
projects  involve  collaboration  of  the  principal  investigators  with 
other  faculty  members  (Ulf  Grenander  at  Brown,  Lewis  Pakula  at  the 
University  of  Rhode  Island,  and  Donald  Geman  at  the  University  of 
Massachusetts) ,  with  postdoctoral  associates  (Barry  Davis  and  Aytul 
Erdal) ,  and  with  Ph.D.  candidates  (Shoulamit  Shwartz,  Nicholas 
Accomando,  Joyce  Anderson,  Edmond  Nadler  and  Brock  Osborn) .  We 
shall  briefly  mention  the  status  of  these  projects,  since  the  work 
to  date  has  been  supported  by  contract  DAAG29-80-K-0006 , 

1.  Aytul  Erdal  and  Stuart  Geman  have  prepared  drafts  of  manuscripts 
that  report  new  consistency  and  asymptotic  distribution  results  for 
cross-validated  estimators  in  ridge  regression  and  principal 
components  analysis.  The  manuscripts  are  based  on  results  reported 
in  Erdal 's  1983  Ph.D.  dissertation.  The  papers  will  be  submitted 
for  publication. 

2.  Stuart  Geman,  Donald  McClure,  and  Barry  Davis  continue  active 
collaboration  with  members  of  the  Division  of  Cardiology  at  Rhode 
Island  Hospital  on  several  digital  image  processing  problems.  These 
include  image  registration  methods  for  noninvasive  digital 
subtraction  imaging  of  coronary  arteries  and  image  enhancement  and 
restoration  methods  for  nuclear  medicine  scans  of  the  myocardium 
that  have  been  degraded  by  photon  attenuation.  We  are  experimenting 
with  the  use  of  the  method  of  sieves  to  estimate  nonlinear 


50 


transformations  that  map  one  scene  into  another;  we  believe  that 
this  will  prove  to  be  an  effective  and  versatile  method  for 
registration  of  scenes. 

Joyce  Anderson  is  working  with  McClure  on  use  of  maximum 
likelihood  and  the  method  of  sieves  to  enhance  the  attenuated 
nuclear  medicine  scans.  These  applications  have  motivated  the  study 
of  new  theoretical  problems  in  nonparametric  estimation  based  on 
censored  data  and  incomplete  observables. 

3.  Shoulamit  Shwartz  has  completed  the  research  for  her  Ph.D. 
thesis  on  the  design  of  triangulation  sieves  for  nonparametric 
multiple  regression.  The  theoretical  questions  are  motivated  by 
surface  reconstruction  problems  in  remote  sensing.  The  thesis  is  in 
draft  form  and  the  results  will  be  included  in  manuscripts  being 
prepared  for  publication  with  Donald  McClure. 

4.  Stuart  Geman  and  Donald  McClure  are  working  together  with  Ulf 
Grenander,  Lewis  Pakula  and  Donald  Geman  on  a  project  to  develop 
mathematical  models  for  complex  systems.  The  phenomena  being 
modeled  include  real  pictures  of  outdoor  scenes  and  of  highly 
structured  scenes  and  systems  as  disparate  from  digital  images  as 
neural  systems  for  complex  decision  processes  such  as  medical 
diagnosis.  To  date,  nine  internal  working  papers  have  been  prepared 
by  the  group.  These  include  a  master  plan  for  the  project,  an 
outline  of  specific  open  problems,  and  preliminary  results  on 
particular  complex  systems  under  investigation. 

5.  Stuart  Geman,  Donald  McClure  and  David  Cooper  recently  organized 
the  program  for  an  ARO  sponsored  Workshop  on  Unsupervised  Image 
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Analysis.  The  workshop  was  held  at  Brown  on  14-16  April  1983. 
McClure  and  Cooper  are  currently  working  with  Dr.  Robert  Launer  on 
the  editing  of  the  workshop  proceedings  which  will  be  published  by 
Academic  Press.  Includsjd  in  the  proceedings  will  be  papers  by  Geman 
on  "A  Markov  random  field  model  for  image  segmentation"  and  by 
McClure  on  "Image  processing  algorithms  based  on  methods  of 
nonparametric  inference". 

6.  Stuart  Geman  presented  an  invited  talk  at  the  June  meeting  of 
the  Institute  of  Mathematical  Statistics  in  Areata,  CA  on  the  work 
with  Grenander,  D.  Geman,  and  McClure  on  hierarchical  Markovian 
models  for  discrete  pictures  and  restoration  algorithms  based  on 
these  models. 

7.  Stuart  Geman  will  present  an  invited  talk  on  "A  parallel 
realization  for  maximum  entropy  distributions  with  applications  to 
problems  in  inference  and  optimization"  at  the  Third  Workshop  on 
Maximum  Entropy  and  Bayesian  Methods  in  Applied  Statistics  on  1-4 
August  1983  at  the  University  of  Wyoming. 

8.  Donald  McClure  will  present  an  invited  talk  on  mathematical 
experiments  on  the  computer  at  the  Institute  of  Mathematical 
Statistics  Special  Topics  Meeting  on  Statistics  and  Computing  on 
24-26  October  1983  at  Pennsylvania  State  University, 
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32.  Publications  Technical  Reports 

The  list  in  this  section  includes  internal  and  interim  technical 
reports  and  published  papers  that  have  been  prepared  as  part  of  the 
project  on  nonparametric  estimation  by  the  method  of  sieves. 
Technical  reports  in  the  series  "Reports  in  Pattern  Analysis"  can  be 
obtained  from  the  Division  of  Applied  Mathematics  at  Brown 
University. 


1.  S.  Geman  &  C-R.  Hwang,  Nonparametric  maximum  likelihood 
estimation  by  the  method  of  sieves.  Reports  in  Pattern  Analysis  No. 
80,  1979.  Ann.  Statist,  vol.  10,  1982,  401-414. 

2.  S.  Geman  &  C-R.  Hwang,  A  chaos  hypothesis  for  some  large  systems 
of  random  equations.  Reports  in  Pattern  Analysis  No.  82,  1979.  i. 
Wahrscheinlichkeitstheorie  verw.  fiebietCr  vol.  60,  1982,  291-314. 

3.  S.  Geman,  An  application  of  the  method  of  sieves:  functional 

estimator  for  the  drift  of  a  diffusion.  Reports  in  Pattern  Analysis 
NO.  92,  1980.  Colloauia  Mathematica  Societas  vol.  32, 

1980,  231-252. 

4.  C.  Plumeri,  Conditioning  by  inclusion  when  connection  type  is 
LINEAR,  Reports  in  Pattern  Analysis  No.  94,  1980.  (Ph.D.  work 
supervised  by  D.  McClure) 

5.  S.  Geman,  The  law  of  large  numbers  in  neural  modelling.  Reports 
in  Pattern  Analysis  No.  95,  1980.  si am- AMS  Proceedings. 

1981,  91-105. 


vol.  13, 
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6.  C.  Plumeri,  On  convergence  of  sums  of  Markov  random  variables, 
Reports  in  Pattern  Analysis  No.  96,  1980.  (Ph.D.  work  supervised  by 
D.  McClure) 

7.  S.  Geman,  Almost  sure  stable  oscillations  in  a  large  system  of 
randomly  coupled  equations.  Reports  in  Pattern  Analysis  No.  97, 

1980. 

8.  S.  Geman,  Sieves  for  nonparametric  estimation  of  densities  and 
regressions.  Reports  in  Pattern  Analysis  No.  99,  1981. 

9.  C.  Plumeri,  Asymptotic  probability  measure  on  regular 
structures.  Reports  in  Pattern  Analysis  No.  100,  1981.  (Ph.D. 
dissertation  supervised  by  D.  McClure) 

10.  J.  E.  Anderson,  Experiments  with  the  method  of  sieves  for 
density  estimation.  Reports  in  Pattern  Analysis  No.  105,  1981. 
(Honors  thesis  supervised  by  S,  Geman  &  D.  McClure) 

11.  y-S.  Chow,  S.  Geman  &  L-D.  Wu,  Consistent  cross-validated 
density  estimation.  Reports  in  Pattern  Analysis  No.  110,  1981.  Ann. 
Statist,  vol.  11,  1983,  25-38. 

12.  S.  Geman  &  D.  E.  McClure,  Characterization  of  a  maximum 
likelihood  nonparametric  density  estimator  of  kernel  type.  Reports 
in  Pattern  Analysis  No.  114,  1982.  Proceedings  the  NASA  Workshop 
on  >>insity  Estimation  and  Function  Smoothing.  L.  F.  Guseman,  Jr. 
(editor),  Texas  A&M  University  (1982),  38-47. 

13.  D.  E.  McClure,  Estimation  of  planar  sets  from  Poisson 
projections.  Reports  in  Pattern  Analysis  No.  115,  1982.  Proceedings 
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M  ths.  NASA  Workshop  on  Density  Estimation  aM  Function  Smoothing r 
L.  F.  Guseman,  Jr.  (editor),  Texas  A&M  University  (1982),  38-47, 

14.  W,  B.  Levy  &  S,  Geman,  Limit  behavior  of  experimentally  derived 
synaptic  modification  rules.  Reports  in  Pattern  Analysis  No.  121, 
1982. 

15.  S.  Geman,  Method  of  sieves.  Reports  in  Pattern  Analysis  No.  125, 
1982.  Encyclopedia  Statistical  Sciences,  vol.  5,  1983. 

16.  N.  Accomando,  The  implementation  of  maximum  likelihood 
reconstruction  algorithms  for  single  photon  emission  tomography, 
Reports  in  Pattern  Analysis  No.  126,  1982.  (Ph.D.  work  supervised  by 
D.  McClure) 

17.  A,  Erdal,  Cross-validated  ridge  regression.  Reports  in  Pattern 
Analysis  No.  127,  1982.  (Ph.D.  work  supervised  by  S.  Geman) 

18.  A.  Erdal,  The  method  of  cross-validation  for  principal  component 
analysis.  Reports  in  Pattern  Analysis  No.  128,  1983.  (Ph.D.  work 
supervised  by  S.  Geman) 

19.  B.  R.  Davis  &  S.  Geman,  The  application  of  neurobiological  and 
statistical  concepts  to  machine  intelligence.  Reports  in  Pattern 
Analysis  No.  129,  1983. 

20.  E.  Nadler,  Least  square  approximation  of  functions  on  an 
equilateral  triangle  by  linear  functions.  Reports  in  Pattern 
Analysis  No,  131,  1983.  (Ph.D,  work  supervised  by  D.  McClure) 


21.  S.  Shwartz,  Optimal  design  of  triangulation  sieves  for 
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nonparametric  multiple  regression  and  surface  restoration,  in 
preparation.  (Ph.D.  dissertation  supervised  by  D.  McClure) 

22.  S.  Geman,  A  Markov  random  field  model  for  image  segmentation,  in 
preparation,  to  appear  in  Automated  Unitaje  Analysis;  Theory 
Experiments.  D.  Cooper,  R.  Launer  &  D.  McClure  (editors).  Academic 
Press. 

23.  D.  E.  McClure,  Image  reconstruction  algorithms  based  on  methods 
of  nonparametric  inference,  in  preparation,  to  appear  in  Automated 
Image  Analysis;  Theory  and  Experiments.  D.  Cooper,  R.  Launer  &  D. 
McClure  (editors).  Academic  Press. 
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11.  Peisonnel 

The  following  people  have  made  substantive  contributions  to  the 
research  project  on  nonparametric  estimation  supported  by  contract 
DAAG29-80-K-0006. 


Stuart  Geman,  co-principal  investigator. 
Donald  E.  McClure,  co-principal  investigator. 


Chii-Ruey  Hwang,  collabjrator  with  S.  Geman  and  Visiting 
Assistant  Professor  at  Brown  University,  1981. 


Nicholas  Accomando,  Ph.D.  candidate,  1980-83,  research 
supervised  by  D,  McClure. 

Joyce  E.  Anderson,  Ph.D.  candidate,  1981-83,  research 
supervised  by  D.  McClure. 

Barry  R.  Davis,  Ph.D.  candidate,  1980-82,  research  supervised 
by  S.  Geman.  Ph.D.  requirements  completed  June  1982. 
Visiting  Assistant  Professor  at  Brown  University, 
1982-83. 

Aytul  Erdal,  Ph.D.  candidate,  1980-83,  research  supervised  by 
S.  Geman.  Ph.D.  requirements  completed  January  1983. 
Research  Associate  at  Brown  University,  1983. 
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Edmond  Nadler,  Ph.D.  candidate#  1981-83#  research  supervised  by 
D.  McClure. 

Charles  Plumeri#  Ph.D.  candidate#  1980-81#  research  supervised 
by  D.  McClure.  Ph.D.  requirements  completed  June 
1981. 

Shoulamit  Shwartz#  Ph.D.  candidate,  1980-83#  research 

supervised  by  D.  McClure.  Thesis  research  comppleted 
June  1983.  Degree  will  be  awarded  June  1984. 
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